# Load spatial data
pa_counties <- st_read("data/Pennsylvania_County_Boundaries.shp")
hospitals <- st_read("data/hospitals.geojson")
census_tracts <- tracts(state = "PA", cb = TRUE)
# Check that all data loaded correctly
glimpse(pa_counties)
glimpse(census_tracts)
glimpse(hospitals)Assignment 2: Spatial Analysis and Visualization
Healthcare Access and Equity in Pennsylvania
Assignment Overview
Learning Objectives:
- Apply spatial operations to answer policy-relevant research questions
- Integrate census demographic data with spatial analysis
- Create publication-quality visualizations and maps
- Work with spatial data from multiple sources
- Communicate findings effectively for policy audiences
Part 1: Healthcare Access for Vulnerable Populations
Research Question
Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?
Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.
Required Analysis Steps
Complete the following analysis, documenting each step with code and brief explanations:
Step 1: Data Collection (5 points)
Load the required spatial data:
- Pennsylvania county boundaries
- Pennsylvania hospitals (from lecture data)
- Pennsylvania census tracts
Your Task:
census_tracts <- st_transform(census_tracts, st_crs(pa_counties))
hospitals <- st_transform(hospitals, st_crs(pa_counties))
# With ggplot2
p1 <- ggplot(pa_counties) +
geom_sf() +
theme_void()
p2 <- ggplot(census_tracts) +
geom_sf() +
theme_void()
p3 <- ggplot(hospitals) +
geom_sf() +
theme_void()
p1 | p2 | p3Questions to answer:
- How many hospitals are in your dataset?
- 223
- How many census tracts?
- 3445
- What coordinate reference system is each dataset in?
- I transformed hospitals and census tracts to PA counties CRS, which is WGS84. Before that, hospitals were also in WGS84, but census tracts were in NAD83.
Step 2: Get Demographic Data
Use tidycensus to download tract-level demographic data for Pennsylvania.
Required variables:
- Total population
- Median household income
- Population 65 years and over (you may need to sum multiple age categories)
Your Task:
# Get demographic data from ACS
pa_tracts_data <- get_acs(
geography = "tract",
variables = c(
median_income = "B19013_001",
total_pop = "B01003_001",
over_65 = "B01001_020" # Population 65 years and over
),
state = "PA",
year = 2023,
output = "wide"
)
# Join to tract boundaries
census_tracts <- census_tracts %>%
left_join(pa_tracts_data, by= "GEOID")
summary(census_tracts)Questions to answer:
- What year of ACS data are you using?
- 2019-2023 5-year ACS data
- How many tracts have missing income data?
- 65
- What is the median income across all PA census tracts?
- $72,944
Step 3: Define Vulnerable Populations
Identify census tracts with vulnerable populations based on TWO criteria:
- Low median household income (choose an appropriate threshold)
- Significant elderly population (choose an appropriate threshold)
Your Task:
# Filter for vulnerable tracts based on your criteria
vulnerable_tracts <- census_tracts %>%
filter(median_incomeE < 42398 | (over_65E/total_popE) > .017)Questions to answer:
- What income threshold did you choose and why?
- I looked at the 2023 poverty guidelines, the average household size in PA in 2023, which was 2.4, and then looked into eligibility for various programs, which is anywhere from 133% FPIG to 215%. I am going with vulnerable household is 215% or less of the 2-person household FPIG, or $42,398.
- What elderly population threshold did you choose and why?
- In order to control for overall population size, instead of looking at the raw number of elderly, I looked at the over 65 as percent of total population. I defined vulnerable as any tract in the 75th percentile of elderly percent of tract population (1.7% or above).
- How many tracts meet your vulnerability criteria?
- 1,145
- What percentage of PA census tracts are considered vulnerable by your definition?
- 33.2% – about a third.
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
Your Task:
#convert to Albers
vulnerable_tracts <- st_transform(vulnerable_tracts, crs = 5070)
hospitals <- st_transform(hospitals, crs = 5070)
# Calculate distance from each tract centroid to nearest hospital
tract_centroids <- st_centroid(vulnerable_tracts)
nearest_hospital <- st_nearest_feature(tract_centroids, hospitals)
vulnerable_tracts <- vulnerable_tracts %>%
mutate(nearest_hospital_geom = hospitals$geometry[nearest_hospital])
vulnerable_tracts <- vulnerable_tracts %>%
mutate(distance_to_nearst_hospital = set_units(st_distance(tract_centroids, vulnerable_tracts$nearest_hospital_geom, by_element = TRUE), "mi"))
summary(vulnerable_tracts$distance_to_nearst_hospital) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.02107 1.21850 2.75761 4.37494 6.02931 29.75156
Requirements:
- Use an appropriate projected coordinate system for Pennsylvania
- Calculate distances in miles
- Explain why you chose your projection
I chose Albers Conical Equal Area, NAD 83 projection based on this PA document from 2011’s recommendation.
Questions to answer:
- What is the average distance to the nearest hospital for vulnerable tracts?
- 4.4 miles
- What is the maximum distance?
- 29.8 miles
- How many vulnerable tracts are more than 15 miles from the nearest hospital?
- 33
Step 5: Identify Underserved Areas
Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.
Your Task:
# Create underserved variable
vulnerable_tracts <- vulnerable_tracts %>%
mutate(underserved = ifelse(as.numeric(distance_to_nearst_hospital) > 15,"Underserved",""))
vulnerable_tracts %>%
group_by(underserved)%>%
summarise(n())Questions to answer:
- How many tracts are under-served?
- 33
- What percentage of vulnerable tracts are under-served?
- 3%
- Does this surprise you? Why or why not?
- Unfortunately, this does not surprise me, given how rural certain areas of Pennsylvania are and knowing how rural healthcare is a persistent issue in the US in general. However, it would surprise me if some of these tracts are in urban areas.
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Your Task:
# Spatial join tracts to counties
pa_counties <- pa_counties %>% st_transform(st_crs(vulnerable_tracts))
vulnerable_per_county <- vulnerable_tracts %>%
st_join(pa_counties) %>%
st_drop_geometry()
# Aggregate statistics by county
vulnerable_per_county_stats <- vulnerable_per_county %>%
group_by(COUNTY_NAM) %>%
summarise(
num_vulnerable_tracts = n(),
num_underserved_tracts = sum(underserved == "Underserved"),
percent_underserved = sum(underserved == "Underserved")/n(),
avg_distance_to_nearst_hospital = mean(distance_to_nearst_hospital),
total_pop = sum(total_popE),
total_underserved_pop = sum(ifelse(underserved == "Underserved", total_popE, 0))
)Required county-level statistics:
- Number of vulnerable tracts
- Number of underserved tracts
- Percentage of vulnerable tracts that are underserved
- Average distance to nearest hospital for vulnerable tracts
- Total vulnerable population
Questions to answer:
- Which 5 counties have the highest percentage of underserved vulnerable tracts?
- Cameron
- Juniata
- Potter
- Snyder
- Sullivan
- Which counties have the most vulnerable people living far from hospitals?
- Clearfield
- Chester
- Juniata
- Snyder
- Pike
- Are there any patterns in where underserved counties are located?
- There are a lot of quite rural underserved counties, like Sullivan, Juniata, and Clearfield, but there are also suburban counties of major PA cities like Chester County outside of Philadelphia and Cumberland County outside Harrisburg.
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
Your Task:
# Create and format priority counties table
vulnerable_per_county_stats %>%
arrange(desc(total_underserved_pop)) %>%
slice_head(n = 10) %>%
select(
c("COUNTY_NAM", "total_underserved_pop", "percent_underserved", "avg_distance_to_nearst_hospital")
) %>%
mutate(
avg_distance_to_nearst_hospital = as.numeric(avg_distance_to_nearst_hospital),
percent_underserved = paste0(round(percent_underserved*100, 2), "%")
)%>%
kable(
col.names = c("County", "Total Underserved Population", "Percent of Vulnerable Tracts that are Underserved", "Average Distance of to Nearest Hospital"),
digit = 1,
format.args = list(big.mark = ","),
align = "l",
caption = "10 Counties in PA with the Highest Absolute Population more than 15 miles from Nearest Hospital"
)| County | Total Underserved Population | Percent of Vulnerable Tracts that are Underserved | Average Distance of to Nearest Hospital |
|---|---|---|---|
| CLEARFIELD | 17,027 | 28.57% | 12.4 |
| CHESTER | 15,467 | 7.5% | 6.1 |
| JUNIATA | 13,955 | 50% | 15.4 |
| SNYDER | 12,073 | 50% | 15.1 |
| PIKE | 10,292 | 35.29% | 15.4 |
| DAUPHIN | 8,815 | 8.33% | 5.3 |
| SCHUYLKILL | 8,815 | 7.41% | 7.0 |
| PERRY | 8,761 | 22.22% | 12.3 |
| LANCASTER | 8,055 | 5.71% | 5.7 |
| CENTRE | 6,843 | 5.88% | 6.2 |
Requirements:
- Use
knitr::kable()or similar for formatting - Include descriptive column names
- Format numbers appropriately (commas for population, percentages, etc.)
- Add an informative caption
- Sort by priority (you decide the metric)
Part 2: Comprehensive Visualization
Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.
Map 1: County-Level Choropleth
Create a choropleth map showing healthcare access challenges at the county level.
Your Task:
# Create county-level access map
vulnerable_per_county_stats <- vulnerable_per_county_stats %>%
left_join(pa_counties, by="COUNTY_NAM")
vulnerable_per_county_stats <- vulnerable_per_county_stats %>%
st_as_sf() %>%
st_transform(st_crs(census_tracts)) %>%
mutate(percent_underserved = percent_underserved*100
)
hospitals <- hospitals %>%
mutate(type = "Hospital")
ggplot(vulnerable_per_county_stats) +
geom_sf(aes(fill = percent_underserved)) +
scale_fill_gradient(
low = "#C3CDFE",
high = "#485EFE",
name = "% Underserved Tracts"
) +
new_scale_fill()+
geom_sf(
data = hospitals,
aes(fill=type),
size = 3,
shape = 21,
color = "#FF8600",
stroke = 0.8
) +
scale_fill_manual(
values = c("Hospital" = "#FF8600"),
name = NULL
)+
labs(
title = "Underserved Tracts per Counties in Pennsylvania",
subtitle = str_wrap("Where an underserved tract is one where the center is at least 15 miles from the nearest hospital",50),
caption ="Data Sources: US Census Data, OpenDataPhilly"
) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_minimal
) +
theme_void() +
theme(
legend.position = "right",
legend.direction = "vertical",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)Requirements:
- Fill counties by percentage of vulnerable tracts that are underserved
- Include hospital locations as points
- Use an appropriate color scheme
- Include clear title, subtitle, and caption
- Use
theme_void()or similar clean theme - Add a legend with formatted labels
Map 2: Detailed Vulnerability Map
Create a map highlighting underserved vulnerable tracts.
Your Task:
# Create detailed tract-level map
vulnerable_tracts <- vulnerable_tracts %>%
st_as_sf() %>%
st_transform(st_crs(census_tracts))
underserved_tracts <- vulnerable_tracts[vulnerable_tracts$underserved=="Underserved",]
pa_counties$legend <- "County Boundary"
ggplot(census_tracts) +
geom_sf(
color="darkgrey",
alpha=.5,
size=.5
)+
geom_sf(
data=underserved_tracts,
aes(fill = underserved)
)+
geom_sf(
data=pa_counties,
aes(color=legend),
linewidth = .8,
fill=NA
)+
geom_sf(
data=hospitals,
aes(fill = type),
size = 1,
shape = 21,
color = "#FF8600",
stroke = 0.8
)+
scale_fill_manual(
values = c("Underserved" = "#485EFE",
"Hospital" = "#FF8600"),
name = NULL
) +
scale_color_manual(
values = c("County Boundary" = "black"),
name = NULL
) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_minimal
) +
theme_void()+
labs(
title = "Underserved Tracts in Pennsylvania",
subtitle = str_wrap("Where an underserved tract's center is at least 15 miles from the nearest hospital",50),
caption ="Data Sources: US Census Data, Pennsylvania Spatial Data Access"
) +
theme(
legend.position = "bottom",
legend.direction = "horizontal",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)Requirements:
- Show underserved vulnerable tracts in a contrasting color
- Include county boundaries for context
- Show hospital locations
- Use appropriate visual hierarchy (what should stand out?)
- Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create distribution visualization
#Load regions
dep_regions<- st_read("data/DEPRegions2024_03.shp")dep_regions <- dep_regions %>% st_transform(st_crs(vulnerable_tracts))
vulnerable_tracts <- vulnerable_tracts %>%
mutate(
urban_rural = case_when(
total_popE >= 5000 ~ "Urban",
TRUE ~ "Rural"
)
)
vulnerable_tracts_with_regions <- vulnerable_tracts %>%
st_join(dep_regions) %>%
st_drop_geometry()
vulnerable_tracts_with_regions <- vulnerable_tracts_with_regions %>%
mutate(
Region = case_when(
SNAME == "NCRO"~"North Central",
SNAME =="NERO"~"North East",
SNAME=="NWRO"~"North West",
SNAME=="SCRO"~"South Central",
SNAME=="SERO"~"South East",
SNAME=="SWRO"~"South West"
)
)
ggplot(vulnerable_tracts_with_regions)+
geom_boxplot(
aes(x=Region, y=distance_to_nearst_hospital)
)+
labs(
title="Distance of Vulnerable Tracts to Nearest Hospital by Region",
caption=str_wrap("Where a vulnerable tract is in the top 25% of concentration of elderly population or 215% of the Federal Poverty Line for 2-person household. Data sources: US Census, Pennsylvania Spatial Data Access, Department of Environmental Protection Regions",100),
y=str_wrap("Distance from Center of Tract to Nearest Hospital",50),
x="Region of Tract"
)+
theme(
legend.position = "bottom",
legend.direction = "horizontal",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)Northern Pennsylvania vulnerable populations generally are location farther from hospitals than Southern Pennsylvania, with North Central PA having the highest median and upper quartiles of distance to the nearest hospital of all 6 regions. However another trend we can see is that the highest number of outliers fall within the South East and South West regions, home to the two biggest cities in PA, Philadelphia and Pittsburgh, respectively. That shows that these regions have more tracts that are facing very different needs that the middle 50% of the region.
Suggested chart types:
- Histogram or density plot of distances
- Box plot comparing distances across regions
- Bar chart of underserved tracts by county
- Scatter plot of distance vs. vulnerable population size
Requirements:
- Clear axes labels with units
- Appropriate title
- Professional formatting
- Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Challenge Options
Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).
Your Analysis
Digital Justice
- Data: Census Broadband access, device access, and income, Philadelphia free wifi spots from OpenDataPhilly
- Question: “Do digitally disadvantaged neighborhoods have equitable access to city internet?”
- Operations: Buffer free wifi spots with a computer (10-minute walk = 0.5 mile), calculate connectivity by tract, determine digitally vulnerable tracts from census data and underserved tracts by distance from buffers.
- Policy relevance: Digital equity, broadband infrastructure, internet-connected device access
Find and load additional data
- Document your data source
- Check and standardize the CRS
- Provide basic summary statistics
# Load your additional dataset
v21 <- load_variables(2023, "acs5", cache = TRUE)
# Search for broadband and computer-related variables
bb_vars <- v21 %>% filter(str_detect(label, "broadband"))
comp_vars <-v21 %>% filter(str_detect(label, "computer"))
#B28003_002 = total with computer
#B28002_004 = total with broadband internet subscription
philly_digital_access <- get_acs(
geography = "tract",
variables = c(
total_pop = "B01003_001",
total_hh = "B28001_001",
has_computer = "B28003_002",
smartphone_only = "B28001_006",
has_broadband = "B28002_004"
),
state = "PA",
county = "Philadelphia",
year = 2023,
output = "wide"
)
free_wifi_spots <- st_read("data/free_city_wifi_locations.shp")%>%st_transform(2272)
philly_digital_access <- left_join(
philly_digital_access,
census_tracts,
by="GEOID"
) %>%
st_as_sf() %>%
st_transform(2272)
ggplot(philly_digital_access)+
geom_sf()+
geom_sf(
data=free_wifi_spots
)+
theme_void()+
labs(
title = "Free Wi-Fi Spots in Philadelphia",
caption ="Data Sources: US Census Data, OpenDataPhilly"
) +
annotation_north_arrow(
location = "br",
which_north = "true",
style = north_arrow_minimal
) +
theme_void() +
theme(
legend.position = "right",
legend.direction = "vertical",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)Questions to answer:
- What dataset did you choose and why?
- I choose the ACS variables showing whether a household has broadband access and whether they have a computer, as well as OpenDataPhilly’s database of free wi-fi locations, which inlcudes a variable showing how many public use computers are available at the location.
- What is the data source and date?
- The ACS data is 5-year data from 2018 - 2023, and the wi-fi locations are from 2024, so the home broadband and computer access data may be slightly outdated. Also important to note is 2018 - 2023 range includes the pandemic, a period in which there was unprecedented investment in home internet subsidies that have expired in 2024 Affordable Connectivity Program.
- How many features does it contain?
- There are 254 free-wifi locations, and 408 census tracts in Philadelphia.
- What CRS is it in? Did you need to transform it?
- Before transformation, both datasets were in web mercator, EPSG 3857, and since Philly falls within southern pennsylvania, I transformed it to Pennsylvania South state plane projection, EPSG 2272.
- Pose a research question
Do digitally vulnerable tracts have adequate access to free wifi spots with computers?
- Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
Required operations (choose 2+):
- Buffers
- Spatial joins
- Spatial filtering with predicates
- Distance calculations
- Intersections or unions
- Point-in-polygon aggregation
Your Task:
# Your spatial analysis
#Filter Philly census tracts by the most digitally vulnerable: in the top quartile of percent of tract that has no home broadband or top quartile of percent of tract with no computer.
philly_digital_access <- philly_digital_access %>%
mutate(
percent_no_bb = (1-has_broadbandE/total_hhE)*100,
percent_smart_only = (smartphone_onlyE/total_hhE)*100
)
summary(philly_digital_access$percent_no_bb) Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 5.499 10.956 12.334 17.514 37.186 20
summary(philly_digital_access$percent_smart_only) Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 4.570 9.456 10.801 15.765 38.915 20
#based on summary stats, the most digitally vulnerable are tract with 17.514% or more of households that do not have home broadband or 15.765% or more of households that do not have a smartphone only.
digit_vulnerable_philly <- philly_digital_access %>%
filter(percent_no_bb>=17.514 | percent_smart_only>=15.765 )%>%
mutate(Vulnerable="Digtally Vulnerable Tract")
#visually examine digitally vulnerable tracts and free wifi spots with computers
ggplot(philly_digital_access)+
geom_sf()+
geom_sf(
data=digit_vulnerable_philly,
aes(fill=Vulnerable)
)+
scale_fill_manual(
values=c("Digtally Vulnerable Tract" = "purple")
)+
geom_sf(data=free_wifi_spots[free_wifi_spots$computers_=="Y",])+
theme_void()+
labs(
title = str_wrap("Digitally Vulnerable Tracts and Free Wifi Spots with a Computer in Philadelphia",70),
subtitle = str_wrap("Where a vulnerable tract is one where 17.514% or more of households do not have home broadband or 15.765% or more of households do not have a computer",60),
caption ="Data Sources: US Census Data, OpenDataPhilly"
) +
annotation_north_arrow(
location = "br",
which_north = "true",
style = north_arrow_minimal
) +
theme_void() +
theme(
legend.position = "right",
legend.direction = "vertical",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)# Find nearest free wifi spots with a computer to tract centroids
digit_vulnerable_centroids <- st_centroid(digit_vulnerable_philly)
nearest_wifi_spot_comp <- st_nearest_feature(digit_vulnerable_centroids, free_wifi_spots[free_wifi_spots$computers_=="Y",])
digit_vulnerable_philly <- digit_vulnerable_philly %>%
mutate(nearest_wifi_spot_geom = free_wifi_spots$geometry[nearest_wifi_spot_comp])
digit_vulnerable_philly <- digit_vulnerable_philly %>%
mutate(distance_wifi_spot_comp = set_units(st_distance(digit_vulnerable_centroids, digit_vulnerable_philly$nearest_wifi_spot_geom, by_element = TRUE), "mi"))
summary(digit_vulnerable_philly$distance_wifi_spot_comp) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.05125 2.82746 5.39764 5.61460 7.63815 15.05424
digit_vulnerable_philly%>%filter(
total_popE.x>400
)%>%summary() GEOID NAME total_popE.x total_popM.x
Length:139 Length:139 Min. : 892 Min. : 267.0
Class :character Class :character 1st Qu.:3182 1st Qu.: 624.5
Mode :character Mode :character Median :4128 Median : 822.0
Mean :4352 Mean : 875.7
3rd Qu.:5264 3rd Qu.:1073.5
Max. :8425 Max. :1943.0
total_hhE total_hhM has_computerE has_computerM smartphone_onlyE
Min. : 514 Min. : 51.0 Min. : 432 Min. : 60.0 Min. : 25.0
1st Qu.:1242 1st Qu.:201.0 1st Qu.:1126 1st Qu.:203.5 1st Qu.:191.5
Median :1686 Median :241.0 Median :1526 Median :256.0 Median :291.0
Mean :1740 Mean :278.4 Mean :1553 Mean :282.3 Mean :312.9
3rd Qu.:2153 3rd Qu.:323.5 3rd Qu.:1926 3rd Qu.:337.0 3rd Qu.:399.5
Max. :3713 Max. :738.0 Max. :3476 Max. :785.0 Max. :981.0
smartphone_onlyM has_broadbandE has_broadbandM STATEFP
Min. : 30.0 Min. : 390.0 Min. : 61.0 Length:139
1st Qu.:117.0 1st Qu.: 979.5 1st Qu.:205.0 Class :character
Median :166.0 Median :1289.0 Median :253.0 Mode :character
Mean :178.8 Mean :1388.0 Mean :273.7
3rd Qu.:217.5 3rd Qu.:1778.5 3rd Qu.:320.5
Max. :614.0 Max. :3234.0 Max. :793.0
COUNTYFP TRACTCE GEOIDFQ NAME.x
Length:139 Length:139 Length:139 Length:139
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
NAMELSAD STUSPS NAMELSADCO STATE_NAME
Length:139 Length:139 Length:139 Length:139
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
LSAD ALAND AWATER NAME.y
Length:139 Min. : 190517 Min. : 0 Length:139
Class :character 1st Qu.: 403434 1st Qu.: 0 Class :character
Mode :character Median : 555597 Median : 0 Mode :character
Mean : 637299 Mean : 2039
3rd Qu.: 763572 3rd Qu.: 0
Max. :2244712 Max. :133239
median_incomeE median_incomeM total_popE.y total_popM.y
Min. : 13721 Min. : 2116 Min. : 892 Min. : 267.0
1st Qu.: 32050 1st Qu.: 9380 1st Qu.:3182 1st Qu.: 624.5
Median : 41313 Median :13530 Median :4128 Median : 822.0
Mean : 42388 Mean :15511 Mean :4352 Mean : 875.7
3rd Qu.: 50068 3rd Qu.:20019 3rd Qu.:5264 3rd Qu.:1073.5
Max. :106420 Max. :41416 Max. :8425 Max. :1943.0
NA's :5 NA's :5
over_65E over_65M geometry percent_no_bb
Min. : 0.00 Min. : 2.00 MULTIPOLYGON :139 Min. : 1.00
1st Qu.: 8.00 1st Qu.: 16.00 epsg:2272 : 0 1st Qu.:14.91
Median : 22.00 Median : 29.00 +proj=lcc ...: 0 Median :20.23
Mean : 35.91 Mean : 43.04 Mean :20.17
3rd Qu.: 58.50 3rd Qu.: 58.00 3rd Qu.:26.27
Max. :197.00 Max. :210.00 Max. :37.19
percent_smart_only Vulnerable nearest_wifi_spot_geom
Min. : 1.508 Length:139 POINT :139
1st Qu.:13.633 Class :character epsg:2272 : 0
Median :17.844 Mode :character +proj=lcc ...: 0
Mean :18.398
3rd Qu.:22.327
Max. :38.915
distance_wifi_spot_comp
Min. : 0.05125
1st Qu.: 2.82746
Median : 5.39764
Mean : 5.61460
3rd Qu.: 7.63815
Max. :15.05424
#Find walking distance buffers around free wifi spots with computers
free_wifi_buffers <- free_wifi_spots %>%
filter(computers_=="Y" & to_display=="ACTIVE")%>%
st_buffer(dist = 2640) # 2640 ft = .5 mi
#dissolve overlapping buffers
free_wifi_buffers_dissolve <- free_wifi_buffers%>%
st_union() %>%
st_cast("POLYGON")%>%
st_as_sf()%>%
mutate(
legend= ".5 mi from Free Wifi Spot w Computer"
)
#examine buffers and tracts overlap
ggplot(philly_digital_access)+
geom_sf(
color="darkgray",
linewidth=.5,
alpha=.5
)+
geom_sf(
data=digit_vulnerable_philly,
aes(fill=percent_no_bb+percent_smart_only)
)+
scale_fill_gradient(
low="#FFCF33",
high="#F53D00",
name=str_wrap("Digital Vulnerability (higher number is more vulnerable)",30)
)+
new_scale_fill()+
geom_sf(
data=free_wifi_buffers_dissolve,
aes(fill=legend),
color="black",
linewidth=.8,
alpha=0.4
)+
scale_color_manual(
values = c(".5 mi from Free Wifi Spot w Computer" = "#F5A6E6") ,
name=NULL
)+
theme_void()+
labs(
title = str_wrap("Degree of Digital Vulnerability and Walking Distance to Public WiFi & Computer",50),
caption ="Data Sources: US Census Data, OpenDataPhilly"
) +
annotation_north_arrow(
location = "br",
which_north = "true",
style = north_arrow_minimal
) +
theme_void() +
guides(fill = guide_legend(title = NULL))+
theme(
legend.position = "right",
legend.direction = "vertical",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)#Define underserved tract as a tract that is digitally vulnerable and outside walking distance to a free wifi spot with a computer.
overlap <- st_intersects(digit_vulnerable_philly, free_wifi_buffers)
no_overlap <- lengths(overlap) == 0
digit_underserved <- digit_vulnerable_philly[no_overlap, ]
summary(digit_underserved)#what are characteristics of top 10 digitally underserved tracts by absolute number of those with no broadband + those with no computer?
#note, the sum of estiamated households without computer and without broadband is not an accurate estimate of reality because those without one are likely overlapping with the other. This metric is just being used to create a metric showing reflecting broadband and computer need in one.
digit_underserved<-digit_underserved%>%
mutate(
digit_need_heuristic = (total_hhE - smartphone_onlyE)+(total_hhE - has_broadbandE)
)
top_10_digit_underserved <- digit_underserved%>%
arrange(desc(digit_need_heuristic))%>%
slice_head(n = 10)
top_10_digit_underserved %>%
st_drop_geometry()%>%
select(c(
"NAMELSAD",
"percent_no_bb",
"percent_smart_only",
"median_incomeE",
"distance_wifi_spot_comp"
)) %>%
kable(
col.names = c("Census Tract",
"Percent w/o Broadband",
"Percent w/o Smartphone Only",
"Median Income",
"Distance to Free Wi-Fi spot w/ Computer"),
digits = 1,
format.args = list(big.mark = ","),
align = "l",
caption = "10 Census Tracts in Philadelphia that have Highest Digitally Vulnerable Population"
)| Census Tract | Percent w/o Broadband | Percent w/o Smartphone Only | Median Income | Distance to Free Wi-Fi spot w/ Computer |
|---|---|---|---|---|
| Census Tract 306 | 21.3 | 7.5 | 44,108 | 5.934038 [mi] |
| Census Tract 300 | 18.2 | 14.5 | 33,419 | 5.459912 [mi] |
| Census Tract 192 | 19.0 | 14.6 | 13,721 | 7.369100 [mi] |
| Census Tract 178 | 12.7 | 15.8 | 42,104 | 2.672626 [mi] |
| Census Tract 282 | 24.6 | 8.6 | 34,420 | 7.595321 [mi] |
| Census Tract 273 | 24.6 | 17.9 | 42,423 | 3.069729 [mi] |
| Census Tract 336 | 5.4 | 16.6 | 46,191 | 5.278315 [mi] |
| Census Tract 268 | 20.9 | 22.9 | 41,330 | 5.397637 [mi] |
| Census Tract 122.01 | 18.8 | 19.5 | 42,782 | 4.911781 [mi] |
| Census Tract 278 | 27.3 | 37.4 | 33,655 | 8.069892 [mi] |
#Map the underserved tracts and wifi spots buffers
digit_underserved<-digit_underserved%>%
mutate(
legend="Digitally Underserved Tract"
)
ggplot(philly_digital_access)+
geom_sf(
color="darkgray",
linewidth=.5,
alpha=.5
)+
geom_sf(
data=digit_underserved,
aes(fill="Digitally Underserved Tract")
)+
scale_fill_manual(
values = c("Digitally Underserved Tract" = "#3F88C5"),
name=NULL
)+
new_scale_fill()+
geom_sf(
data=free_wifi_buffers_dissolve,
aes(fill=".5 mi from Free Wifi Spot w Computer"),
color="black",
linewidth=.8,
alpha=0.4
)+
scale_color_manual(
values = c(".5 mi from Free Wifi Spot w Computer" = "#F5A6E6")
)+
guides(fill = guide_legend(title = NULL))+
theme_void()+
labs(
title = str_wrap("Digitally Underserved Tracts in Philadelphia ",40),
caption ="Data Sources: US Census Data, OpenDataPhilly"
) +
annotation_north_arrow(
location = "br",
which_north = "true",
style = north_arrow_minimal
) +
theme_void() +
guides(fill = guide_legend(title = NULL))+
theme(
legend.position = "right",
legend.direction = "vertical",
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, size = 12),
plot.caption = element_text(hjust = 0.5, size = 9, margin = margin(t = 10))
)Interpretation:
While the Free Wi-Fi Spots in Philadelphia are a very promising program to help bridge the gap for those without internet and a computer at home, a lot of the centers with computers are outside walking distance for the most digitally vulnerable in Philadelphia. There are 30 total tracts in Philly that completely fall outside walking distance to a Wi-Fi computer center, mostly concentrated in North Philadelphia. The estimated number of households in these 30 tracts without broadband is 103,418 and without a computer is 100,478. The average distance from these underserved tracts to the nearest free Wi-Fi and computer center is 7 miles.
The internet has become indispensable to most educational, workforce, social, and civic needs, and thus policy should primarily aim to enable every person to have home broadband and a computer. However, while that may be a long-term goal, free Wi-Fi and computer centers help bridge the gap. Thus they should prioritize locations as close as possible to the most digitally vulnerable areas of Philadelphia, seeing as a longer commute to these centers will compound the barriers already faced by not having broadband or a computer at home.
Finally - A few comments about your incorporation of feedback!
Taking feedback into account, I have hidden sensitive code blocks and hidden lengthy console output that is not essential to interpreting my work!
Submission Requirements
What to submit:
- Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
- Use
embed-resources: truein YAML so it’s a single file - All code should run without errors
- All maps and charts should display correctly
- Use
File naming: LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd